Abstract. Given a fixed matrix, the problem of column subset selection requests a column submatrix
that has favorable spectral properties. Most research from the algorithms and numerical linear
algebra communities focuses on a variant called rank-revealing QR, which seeks a well-conditioned
collection of columns that spans the (numerical) range of the matrix. The functional analysis
literature contains another strand of work on column selection whose algorithmic implications have
not been explored. In particular, a celebrated result of Bourgain and Tzafriri demonstrates that 
each matrix with normalized columns contains a large column submatrix that is exceptionally well 
conditioned. Unfortunately, standard proofs of this result cannot be regarded as algorithmic. 

This paper presents a randomized, polynomial-time algorithm that produces the submatrix 
promised by Bourgain and Tzafriri. The method involves random sampling of columns, followed 
by a matrix factorization that exposes the well-conditioned subset of columns. This factorization, 
which is due to Grothendieck, is regarded as a central tool in modern functional analysis. The 
primary novelty in this work is an algorithm, based on eigenvalue minimization, for constructing 
the Grothendieck factorization. These ideas also result in a novel approximation algorithm for the 
$(\infty, 1)$ norm of a matrix, which is generally NP-hard to compute exactly. As an added bonus,
this work reveals a surprising connection between matrix factorization and the famous MAXCUT 
semidefinite program. 
1. Introduction

Column subset selection refers to the challenge of extracting from a matrix a column submatrix 
that has some distinguished property. These properties commonly involve conditions on the spectrum
of the submatrix. The most familiar example is probably rank-revealing QR, which seeks a
well-conditioned collection of columns that spans the (numerical) range of the matrix [GE96].

The literature on geometric functional analysis contains several fundamental theorems on column 
subset selection that have not been discussed by the algorithms community or the numerical linear 
algebra community. These results are phrased in terms of the stable rank of a matrix: 

$$\operatorname{st.rank}(A) = \frac{\|A\|_F^2}{\|A\|^2},$$

where $\|\cdot\|_F$ is the Frobenius norm and $\|\cdot\|$ is the spectral norm. The stable rank can be viewed as
an analytic surrogate for the algebraic rank. Indeed, express the two norms in terms of singular 
values to obtain the relation 

$$\operatorname{st.rank}(A) \le \operatorname{rank}(A).$$

In this bound, equality occurs (for example) when the columns of A are identical or when the 
columns of A are orthonormal. As we will see, the stable rank is tightly connected with the 
number of (strongly) linearly independent columns we can extract from a matrix. 
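To make the definition concrete, the following sketch evaluates the stable rank on the two extreme cases just mentioned. It is only an illustration under stated assumptions: the matrix is real and stored as a list of columns, the helper names (`gram`, `spectral_norm_sq`, `stable_rank`) are ours, and the spectral norm is estimated by plain power iteration from an all-ones start, which can fail when that start is orthogonal to the top eigenvector.

```python
import math

def gram(cols):
    """Gram matrix G = A^T A for a real matrix given as a list of columns."""
    return [[sum(x * y for x, y in zip(u, v)) for v in cols] for u in cols]

def spectral_norm_sq(cols, iters=200):
    """Estimate ||A||^2 = lambda_max(A^T A) by power iteration (illustrative only)."""
    G = gram(cols)
    n = len(cols)
    v = [1.0 / math.sqrt(n)] * n  # arbitrary starting vector (an assumption)
    lam = 0.0
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
        # Rayleigh quotient of the normalized iterate
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam

def stable_rank(cols):
    """Squared Frobenius norm divided by squared spectral norm (A must be nonzero)."""
    frob_sq = sum(x * x for col in cols for x in col)
    return frob_sq / spectral_norm_sq(cols)

# Three identical unit columns, and three orthonormal columns.
identical = [[1.0, 0.0, 0.0]] * 3
orthonormal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(stable_rank(identical))    # close to 1, the algebraic rank
print(stable_rank(orthonormal))  # close to 3, the algebraic rank
```

In both cases the stable rank coincides with the algebraic rank, matching the equality cases noted above.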

Before we continue, let us instate some notation. We say that a matrix is standardized when
its columns have unit $\ell_2$ norm. The $j$th column of a matrix $A$ is denoted by $a_j$. For a subset $\tau$
of column indices, we write $A_\tau$ for the column submatrix indexed by $\tau$. Likewise, given a square
matrix $H$, the notation $H_{\tau\times\tau}$ refers to the principal submatrix whose rows and columns are listed
in $\tau$. The pseudoinverse $D^\dagger$ of a diagonal matrix $D$ is formed by reciprocating the nonzero entries.
As usual, we write $\|\cdot\|_p$ for the $\ell_p$ vector norm. The condition number of a matrix is the quantity

$$\kappa(A) = \max\left\{ \frac{\|Ax\|_2}{\|Ay\|_2} : \|x\|_2 = \|y\|_2 = 1 \right\}.$$

Finally, upright letters (c, C, K, . . . ) refer to positive, universal constants that may change from 
appearance to appearance. 

The first theorem, due to Kashin and Tzafriri, shows that each matrix with standardized columns
contains a large column submatrix that has small spectral norm [Ver01, Thm. 2.5].

Theorem 1.1 (Kashin-Tzafriri). Suppose $A$ is standardized. Then there is a set $\tau$ of column
indices for which

$$|\tau| \ge \operatorname{st.rank}(A) \quad\text{and}\quad \|A_\tau\| \le C.$$

In fact, much more is true. Combining Theorem 1.1 with the celebrated restricted invertibility 
result of Bourgain and Tzafriri [BT87, Thm. 1.2], we find that every standardized matrix contains 
a large column submatrix whose condition number is small. 

Theorem 1.2 (Bourgain-Tzafriri). Suppose $A$ is standardized. Then there is a set $\tau$ of column
indices for which

$$|\tau| \ge c \cdot \operatorname{st.rank}(A) \quad\text{and}\quad \kappa(A_\tau) \le \sqrt{3}.$$

Theorem 1.2 yields the best general result [BT91, Thm. 1.1] on the Kadison-Singer conjecture, 
the major open question in operator theory. To display its strength, let us consider two extreme 
examples. 

(1) When $A$ has identical columns, every collection of two or more columns is singular.
Theorem 1.2 guarantees a well-conditioned submatrix $A_\tau$ with $|\tau| = 1$, which is optimal.

(2) When $A$ has $n$ orthonormal columns, the full matrix is perfectly conditioned. Theorem 1.2
guarantees a well-conditioned submatrix $A_\tau$ with $|\tau| \ge cn$, which lies within a constant
factor of optimal.
The stable rank allows Theorem 1.2 to interpolate between the two extremes. Subsequent research
established that the stable rank is intrinsic to the problem of finding well-conditioned submatrices. 
We postpone a more detailed discussion of this point until Section 6. 

1.1. Contributions. Although Theorems 1.1 and 1.2 would be very useful in computational
applications, we cannot regard current proofs as constructive. The goal of this paper is to establish
the following novel, algorithmic claim. 

Theorem 1.3. There are randomized, polynomial-time algorithms for producing the sets guaranteed 
by Theorem 1.1 and by Theorem 1.2. 

This result is significant because no known algorithm for column subset selection is guaranteed 
to produce a submatrix whose condition number has constant order. See [BDM08] for a recent 
overview of that literature. The present work has other ramifications with independent interest. 

• We develop algorithms for computing the matrix factorizations of Pietsch and Grothendieck, 
which are regarded as basic instruments in modern functional analysis [Pis86]. 

• The methods for computing these factorizations lead to new approximation algorithms for 
two NP-hard matrix norms. (See Remarks 3.2 and 5.6.) 

• We identify an intriguing connection between Pietsch factorization and the maxcut semi- 
definite program [GW95]. 

1.2. Overview. We focus on the algorithmic version of the Kashin-Tzafriri theorem because it 
highlights all the essential concepts while minimizing irrelevant details. Section 2 outlines a proof of 
this result, emphasizing where new algorithmic machinery is required. The missing link turns out 
to be a computational method for producing a certain matrix factorization. Section 3 reformulates 
the factorization problem as an eigenvalue minimization, which can be completed with standard 
techniques. In Section 4, we exhibit a randomized algorithm that delivers the submatrix promised 
by Kashin-Tzafriri. In Section 5, we traverse a similar route to develop an algorithmic version of 
Bourgain-Tzafriri. Section 6 provides more details about the stable rank and describes directions 
for future work. Appendix A contains some key estimates on the norms of random submatrices, 
and Appendix B outlines a simple computational procedure for solving the eigenvalue optimization 
problems that arise in our work. 

2. The Kashin-Tzafriri Theorem 

The proof of the Kashin-Tzafriri theorem proceeds in two steps. First, we select a random set 
of columns with appropriate cardinality. Second, we use a matrix factorization to identify and 
remove redundant columns that inflate the spectral norm. The proof gives strong hints about how 
a computational procedure might work, even though it is not constructive. 

2.1. Intuitions. We would like to think that a random submatrix inherits its share of the norm 
of the entire matrix. In other words, if we were to select a tenth of the columns, we might hope to 
reduce the norm by a factor of ten. Unfortunately, this intuition is meretricious. 

Indeed, random selection does not necessarily reduce the spectral norm at all. The essential 
reason emerges when we consider the "double identity," the $m \times 2m$ matrix $A = [I \mid I]$. Suppose
we draw $s$ random columns from $A$ without replacement. The probability that all $s$ columns are
distinct is

$$\frac{2m-2}{2m-1} \times \frac{2m-4}{2m-2} \times \cdots \times \frac{2m-2(s-1)}{2m-(s-1)} \le \prod_{j=0}^{s-1}\left(1 - \frac{j}{2m}\right) \approx \exp\left(-\sum_{j=0}^{s-1}\frac{j}{2m}\right) \approx e^{-s^2/4m}.$$

Therefore, when $s = \Omega(\sqrt{m})$, sampling almost always produces a submatrix with at least one
duplicated column. A duplicated column means that the norm of the submatrix is $\sqrt{2}$, which
equals the norm of the full matrix, so no reduction takes place.
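The birthday-type estimate above is easy to check numerically. The sketch below evaluates the exact product for the double identity and compares it with the exponential estimate; the function names are ours and serve only as an illustration.

```python
import math

def prob_all_distinct(m, s):
    """Exact probability that s columns drawn without replacement from the
    m x 2m matrix [I | I] contain no duplicated column."""
    if s > m:
        return 0.0  # pigeonhole: more than m columns must repeat one
    p = 1.0
    for j in range(s):
        # after j distinct draws, 2m - j columns remain and j of them are duplicates
        p *= (2 * m - 2 * j) / (2 * m - j)
    return p

def exponential_estimate(m, s):
    """The estimate exp(-s^2 / 4m) from the display above."""
    return math.exp(-s * s / (4 * m))

m = 10_000
for s in (50, 100, 200, 400):
    print(s, prob_all_distinct(m, s), exponential_estimate(m, s))
```

Once $s$ is a moderate multiple of $\sqrt{m}$, the probability of avoiding a duplicate is already tiny.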
Nevertheless, a randomly chosen set of columns from a standardized matrix typically contains
a large set of columns that has small norm. We will see that the desired subset is exposed by 
factoring the random submatrix. This factorization, which was invented by Pietsch, is regarded as 
a basic instrument in modern functional analysis. 

2.2. The $(\infty, 2)$ operator norm. Although sampling does not necessarily reduce the spectral
norm, it often reduces other matrix norms. Define the natural norm on linear operators from $\ell_\infty$
to $\ell_2$ via the expression

$$\|B\|_{\infty\to2} = \max\{\|Bx\|_2 : \|x\|_\infty = 1\}.$$

An immediate consequence is that $\|B\|_{\infty\to2} \le \sqrt{s}\,\|B\|$ for each matrix $B$ with $s$ columns. Equality
can obtain in this bound.
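For very small real matrices, the definition can be evaluated by brute force. The sketch below assumes a real matrix stored as a list of rows and scans all sign vectors, which suffices because a convex function attains its maximum over the cube at a vertex. The cost is $2^s$ evaluations, so this is a way to sanity-check the definition, not a practical method; the function name is ours.

```python
import itertools
import math

def inf_to_2_norm(B):
    """Brute-force ||B||_{inf->2} for a small real matrix B (list of rows)."""
    s = len(B[0])
    best = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=s):
        Bx = [sum(b * x for b, x in zip(row, signs)) for row in B]
        best = max(best, math.sqrt(sum(v * v for v in Bx)))
    return best

# The identity has spectral norm 1, so the bound sqrt(s) * ||B|| is attained.
identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(inf_to_2_norm(identity3), math.sqrt(3))
```

The identity matrix shows that equality can hold in the bound $\|B\|_{\infty\to2} \le \sqrt{s}\,\|B\|$.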

The exact calculation of the $(\infty, 2)$ operator norm is computationally difficult. Results of
Rohn [Roh00] imply that there is a class of positive semidefinite matrices for which it is NP-hard
to estimate $\|\cdot\|_{\infty\to2}$ within an absolute tolerance. Nevertheless, we will see that the norm can
be approximated in polynomial time up to a small relative error. (See Remark 3.2.)

As we have intimated, the $(\infty, 2)$ norm can often be reduced by random selection. The following
theorem requires some heavy lifting, which we delegate to Appendix A.2.

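Theorem 2.1. Suppose $A$ is a standardized matrix with $n$ columns. Choose

$$s \le \lceil 2\operatorname{st.rank}(A) \rceil,$$

and draw a uniformly random subset $\sigma$ with cardinality $s$ from $\{1, 2, \dots, n\}$. Then

$$\mathbb{E}\,\|A_\sigma\|_{\infty\to2} \le 7\sqrt{s}.$$

In particular, $\|A_\sigma\|_{\infty\to2} \le 8\sqrt{s}$ with probability at least 1/8.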

2.3. Pietsch factorization. We cannot exploit the bound in Theorem 2.1 unless we have a way
to connect the $(\infty, 2)$ norm with the spectral norm. To that end, let us recall one of the landmark
theorems of functional analysis.

Theorem 2.2 (Pietsch Factorization). Each matrix $B$ can be factored as $B = TD$ where

• $D$ is a nonnegative, diagonal matrix with $\operatorname{trace}(D^2) = 1$, and

• $\|B\|_{\infty\to2} \le \|T\| \le K_P \|B\|_{\infty\to2}$.

This result follows from the little Grothendieck theorem [Pis86, Sec. 5b] and the Pietsch
factorization theorem [Pis86, Cor. 1.8]. The standard proof produces the factorization using an abstract
separation argument that offers no algorithmic insight. The value of the constant $K_P$ is known.

• When the scalar field is real, we have $K_P(\mathbb{R}) = \sqrt{\pi/2} \approx 1.25$.

• When the scalar field is complex, we have $K_P(\mathbb{C}) = \sqrt{4/\pi} \approx 1.13$.

A major application of Pietsch factorization is to identify a submatrix with controlled spectral 
norm. The following proposition describes the procedure. 

Proposition 2.3. Suppose $B$ is a matrix with $s$ columns. Then there is a set $\tau$ of column indices
for which

$$|\tau| \ge \frac{s}{2} \quad\text{and}\quad \|B_\tau\| \le K_P \sqrt{\frac{2}{s}}\, \|B\|_{\infty\to2}.$$



Proof. Consider a Pietsch factorization $B = TD$, and define

$$\tau = \{j : d_{jj}^2 < 2/s\}.$$

Since $\sum_j d_{jj}^2 = 1$, Markov's inequality implies that $|\tau| \ge s/2$. We may calculate that

$$\|B_\tau\| = \|TD_\tau\| \le \|T\| \cdot \|D_\tau\| \le K_P \|B\|_{\infty\to2} \cdot \sqrt{2/s}.$$

This completes the proof. □
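The selection rule in this proof is a simple threshold on the diagonal of $D$. A minimal sketch follows, assuming the squared diagonal entries of $D$ are already available as a list that sums to one; the function name and the 0-based indexing are ours.

```python
def select_columns(d_squared):
    """Return tau = {j : d_jj^2 < 2/s}, given the squared diagonal entries of D.

    Indices are 0-based here. Because the entries sum to one, at most s/2 of
    them can reach the threshold 2/s, so at least s/2 indices are kept.
    """
    s = len(d_squared)
    return [j for j, dj2 in enumerate(d_squared) if dj2 < 2.0 / s]

# One dominant diagonal entry is discarded; the remaining columns are kept.
print(select_columns([0.7, 0.1, 0.1, 0.1]))
# A flat diagonal keeps every column.
print(select_columns([0.25, 0.25, 0.25, 0.25]))
```

Columns with a large diagonal weight are exactly the ones that the factorization flags as inflating the spectral norm.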
2.4. Proof of Kashin-Tzafriri. With these results at hand, we easily complete the proof of
the Kashin-Tzafriri theorem. Suppose $A$ is a standardized matrix with $n$ columns. Assume that
$\operatorname{st.rank}(A) < n/2$. Otherwise, the spectral norm $\|A\| \le \sqrt{2}$, so we may select $\tau = \{1, 2, \dots, n\}$.
According to Theorem 2.1, there is a subset $\sigma$ of column indices for which

$$|\sigma| \ge 2\operatorname{st.rank}(A) \quad\text{and}\quad \|A_\sigma\|_{\infty\to2} \le 8\sqrt{|\sigma|}.$$

Apply Proposition 2.3 to the matrix $B = A_\sigma$ to obtain a subset $\tau$ inside $\sigma$ for which

$$|\tau| \ge \frac{|\sigma|}{2} \quad\text{and}\quad \|B_\tau\| \le K_P \sqrt{\frac{2}{|\sigma|}}\, \|B\|_{\infty\to2}.$$

Since $B_\tau = A_\tau$ and $K_P \le \sqrt{\pi/2}$, these bounds reveal the advertised conclusion:

$$|\tau| \ge \operatorname{st.rank}(A) \quad\text{and}\quad \|A_\tau\| \le 15.$$

At this point, we take a step back and notice that this proof is nearly algorithmic. It is
straightforward to perform the random selection described in Theorem 2.1. Provided that we know a
Pietsch factorization of the matrix $B$, we can easily carry out the column selection of
Proposition 2.3. Therefore, we need only develop an algorithm for computing the Pietsch factorization to
reach an effective version of the Kashin-Tzafriri theorem.

3. Pietsch Factorization via Convex Optimization 

The main novelty is to demonstrate that we can produce a Pietsch factorization by solving a 
convex programming problem. Remarkably, the resulting optimization is the dual of the famous 
maxcut semidefinite program [GW95], for which many polynomial-time algorithms are available. 

3.1. Pietsch and eigenvalues. The next theorem, which serves as the basis for our computational 
method, demonstrates that Pietsch factorizations have an intimate relationship with the eigenvalues 
of a related matrix. In the sequel, we reserve the letter $D$ for a nonnegative, diagonal matrix with
$\operatorname{trace}(D^2) = 1$, and we write $\lambda_{\max}$ for the algebraically maximal eigenvalue of a Hermitian matrix.

Theorem 3.1. The factorization $B = TD$ satisfies $\|T\| \le \alpha$ if and only if $D$ satisfies

$$\lambda_{\max}(B^*B - \alpha^2 D^2) \le 0.$$

In particular, if no $D$ verifies this bound, then no factorization $B = TD$ admits $\|T\| \le \alpha$.

Proof. Assume $B$ has a factorization $B = TD$ with $\|T\| \le \alpha$. We have the chain of implications

$$\begin{aligned}
B = TD \quad &\Longrightarrow\quad \|Bx\|_2^2 = \|TDx\|_2^2 \quad \forall x \\
&\Longrightarrow\quad \|Bx\|_2^2 \le \alpha^2 \|Dx\|_2^2 \quad \forall x \\
&\Longrightarrow\quad x^*B^*Bx \le \alpha^2 x^*D^2x \quad \forall x \\
&\Longrightarrow\quad x^*(B^*B - \alpha^2 D^2)x \le 0 \quad \forall x \\
&\Longrightarrow\quad B^*B - \alpha^2 D^2 \preceq 0,
\end{aligned}$$

where $\preceq$ denotes the semidefinite, or Löwner, ordering on Hermitian matrices.

Conversely, assume we are provided the inequality

$$B^*B - \alpha^2 D^2 \preceq 0. \tag{3.1}$$

First, we claim that any zero entry in $D$ corresponds with a zero column of $B$. To check this point,
suppose that $d_{jj} = 0$ for an index $j$. The relation (3.1) requires that

$$0 \ge (B^*B - \alpha^2 D^2)_{jj} = b_j^* b_j.$$
This inequality is impossible unless $b_j = 0$. To continue, set $T = BD^\dagger$, and observe that $B = TD$
because the zero entries of $D$ correspond with zero columns of $B$. Therefore, we may factor the
diagonal matrix out from (3.1) to reach

$$D(T^*T - \alpha^2 P)D \preceq 0,$$

where the matrix $P = DD^\dagger$ is an orthogonal projector. Sylvester's theorem on inertia [HJ85,
Thm. 4.5.8] ensures that $T^*T - \alpha^2 P \preceq 0$. Since $P$ is a projector, this relation implies that

$$T^*T \preceq \alpha^2 P \preceq \alpha^2 I.$$

We conclude that $\|T\| \le \alpha$. □

3.2. Factorization via optimization. Recall that the maximum eigenvalue is a convex function
on the space of Hermitian matrices, so it can be minimized in polynomial time [LO96]. We are led
to consider the convex program

$$\min\ \lambda_{\max}(B^*B - \alpha^2 F) \quad\text{subject to}\quad \operatorname{trace}(F) = 1,\ F \text{ diagonal, and } F \ge 0. \tag{3.2}$$

Owing to Theorem 3.1, there exists a factorization $B = TD$ with $\|T\| \le \alpha$ if and only if the value
of (3.2) is nonpositive.

Now, if $F$ is a feasible point of (3.2) with a nonpositive objective value, we can factorize

$$B = TD \quad\text{with}\quad D = F^{1/2},\ T = BD^\dagger,\ \text{and } \|T\| \le \alpha.$$

In fact, it is not necessary to solve (3.2) to optimality. Suppose $B$ has $s$ columns, and assume we
have identified a feasible point $F$ with a (positive) objective value $\eta$. That is,

$$\lambda_{\max}(B^*B - \alpha^2 F) \le \eta.$$

Rearranging this relation, we reach

$$B^*B - (\alpha^2 + \eta s)\widetilde{F} \preceq 0 \quad\text{where}\quad \widetilde{F} = \frac{1}{\alpha^2 + \eta s}(\alpha^2 F + \eta I).$$

Since $\widetilde{F}$ is positive and diagonal with $\operatorname{trace}(\widetilde{F}) = 1$, we obtain the factorization

$$B = TD \quad\text{with}\quad D = \widetilde{F}^{1/2},\ T = BD^{-1},\ \text{and } \|T\| \le \sqrt{\alpha^2 + \eta s}.$$

To select a target value for the parameter $\alpha$, we look to the proof of the Kashin-Tzafriri theorem.
If $B$ has $s$ columns, then $\alpha = 8K_P\sqrt{s}$ is an appropriate choice. Furthermore, since the argument
only uses the bound $\|T\| = O(\sqrt{s})$, it suffices to solve (3.2) with precision $\eta = O(1)$.
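The passage from an approximately optimal point $F$ to a factorization involves only diagonal arithmetic. The sketch below assumes that $F$ is given by its diagonal entries, that $B$ is real and stored as a list of columns, and that the eigenvalue bound $\lambda_{\max}(B^*B - \alpha^2 F) \le \eta$ has been certified elsewhere (it is not checked here). The function names are ours.

```python
import math

def rescale_feasible_point(F, alpha, eta):
    """Diagonal of F_tilde = (alpha^2 F + eta I) / (alpha^2 + eta s), together
    with the resulting bound sqrt(alpha^2 + eta s) on ||T||."""
    s = len(F)
    denom = alpha ** 2 + eta * s
    F_tilde = [(alpha ** 2 * f + eta) / denom for f in F]
    return F_tilde, math.sqrt(denom)

def pietsch_factors(B_cols, F_tilde):
    """D = F_tilde^{1/2} and T = B D^{-1}: column j of B is divided by d_jj."""
    d = [math.sqrt(f) for f in F_tilde]
    T_cols = [[x / dj for x in col] for col, dj in zip(B_cols, d)]
    return T_cols, d

# Hypothetical inputs, chosen only to exercise the arithmetic.
F = [1.0, 0.0, 0.0]
F_tilde, t_bound = rescale_feasible_point(F, alpha=2.0, eta=1.0)
B_cols = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
T_cols, d = pietsch_factors(B_cols, F_tilde)
print(F_tilde, t_bound)
```

Adding $\eta I$ makes every diagonal entry strictly positive, which is why an ordinary inverse replaces the pseudoinverse in this case.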

3.3. Other formulations. In a general setting, a target value for a is not likely to be available. 
Let us exhibit an alternative formulation of (3.2) that avoids this inconvenience. 

$$\min\ \lambda_{\max}(B^*B - E) + \operatorname{trace}(E) \quad\text{subject to}\quad E \text{ diagonal},\ E \ge 0. \tag{3.3}$$

Suppose $\alpha_\star$ is the minimal value of $\|T\|$ achievable in any Pietsch factorization $B = TD$. It can be
shown that $\alpha_\star^2$ is the value of (3.3) and that each optimizer $E_\star$ satisfies $\operatorname{trace}(E_\star) = \alpha_\star^2$. As such,
we can construct an optimal Pietsch factorization from a minimizer:

$$B = TD \quad\text{with}\quad D = (E_\star / \operatorname{trace}(E_\star))^{1/2},\ T = BD^\dagger,\ \text{and } \|T\| = \alpha_\star.$$

The dual of (3.3) is the semidefinite program

$$\max\ \langle B^*B, Z \rangle \quad\text{subject to}\quad \operatorname{diag}(Z) = I \text{ and } Z \succeq 0. \tag{3.4}$$

This is the famous maxcut semidefinite program [GW95]. We find an unexpected connection
between Pietsch factorization and the problem of partitioning nodes of a graph.
Given a dual optimum, we can easily construct a primal optimum by means of the complementary
slackness condition [Ali95, Thm. 2.10]. Indeed, each feasible optimal pair $(E_\star, Z_\star)$ satisfies
$Z_\star(B^*B - E_\star) = 0$. Examining the diagonal elements of this matrix equation, we find that

$$E_\star = \operatorname{diag}(E_\star) = \operatorname{diag}(Z_\star E_\star) = \operatorname{diag}(Z_\star B^*B)$$

owing to the constraint $\operatorname{diag}(Z_\star) = I$. Obtaining a dual optimum from a primal optimum, however,
requires more ingenuity.

Remark 3.2. According to Theorem 2.2 and the discussion here, the optimal value of (3.3)
overestimates $\|B\|_{\infty\to2}^2$ by a multiplicative factor no greater than $K_P^2$. As a result, the optimization
problem (3.3) can be used to design an approximation algorithm for $(\infty, 2)$ norms.

3.4. Algorithmic aspects. The purpose of this paper is not to rehash methods for solving a 
standard optimization problem, so we keep this discussion brief. It is easy to see that (3.2) can be 
framed as a (nonsmooth) convex optimization over the probability simplex. Appendix B outlines 
an elegant technique, called Entropic Mirror Descent [BT03], designed specifically for this class 
of problems. Although the EMD algorithm is (theoretically) not the most efficient approach to 
(3.2), preliminary experiments suggest that its empirical performance rivals more sophisticated 
techniques. 

For a concrete time bound, we refer to Alizadeh's work on primal-dual potential reduction
methods for semidefinite programming [Ali95]. When $B$ has dimension $m \times s$, the cost of forming
$B^*B$ is at most $O(s^2 m)$. Then the cost of solving (3.4) is no more than $\widetilde{O}(s^{3.5})$, where the tilde
indicates that log-like factors are suppressed.

4. An Algorithm for Kashin-Tzafriri 

At this point, we have amassed the materiel necessary to deploy an algorithm that constructs the 
set r promised by the Kashin-Tzafriri theorem. The procedure appears on page 11 as Algorithm 1. 
The following result describes its performance. 

Theorem 4.1. Suppose $A$ is an $m \times n$ standardized matrix. With probability at least 4/5,
Algorithm 1 produces a set $\tau = \tau_\star$ of column indices for which

$$|\tau| \ge \tfrac{1}{2}\operatorname{st.rank}(A) \quad\text{and}\quad \|A_\tau\| \le 15.$$

The computational cost is bounded by $\widetilde{O}(|\tau|^2 m + |\tau|^{3.5})$.

Remarkably, Algorithm 1 is sublinear in the size of the matrix when $\operatorname{st.rank}(A) = o(n^{1/3.5})$.
Better methods for solving (3.2) would strengthen this bound.

Proof. According to Section 2, the procedure Norm-Reduce has failure probability less than 7/8
when $s \le 2\operatorname{st.rank}(A)$. The probability the inner loop fails to produce an acceptable set $\tau_\star$ of size
$s/2$ is at most $(7/8)^{8\log_2 s}$. So the probability the algorithm fails before $s \ge \operatorname{st.rank}(A)$ is at most

$$\sum_{j=2}^{\infty} (7/8)^{8j} = \frac{(7/8)^{16}}{1 - (7/8)^8} < 0.2.$$

With constant probability, we obtain a set $\tau_\star$ with cardinality at least $\operatorname{st.rank}(A)/2$.
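The geometric series in this bound can be checked by direct arithmetic. The short computation below indexes the outer loop by $j = \log_2 s$, so the inner loop at stage $j$ fails with probability at most $q^j$ where $q = (7/8)^8$; the variable names are ours.

```python
q = (7 / 8) ** 8                 # chance that 8 independent trials all fail
closed_form = q ** 2 / (1 - q)   # sum over j >= 2 of q^j
partial_sum = sum(q ** j for j in range(2, 200))

print(q, closed_form, partial_sum)
```

The closed form evaluates to roughly 0.18, which sits below the threshold 0.2 and accounts for the success probability of 4/5 in Theorem 4.1.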

The cost of the procedure Norm-Reduce is dominated by the cost of the Pietsch factorization,
which is $\widetilde{O}(s^2 m + s^{3.5})$ for a fixed $s$. Summing over $s$ and $k$, we find that the total cost of all
the invocations of Norm-Reduce is dominated (up to logarithmic factors) by the cost of the final
invocation, during which the parameter $s \le 2|\tau_\star|$.

An estimate of the spectral norm of $A_\tau$ can be obtained as a by-product of solving (3.2). Indeed,
Proposition 2.3 and the discussion in Section 3.2 show that we can bound the spectral norm in
terms of the parameter $\alpha$ and the objective value obtained in (3.2). □
5. The Bourgain-Tzafriri Theorem

Our proof of the Bourgain-Tzafriri theorem is almost identical in structure with the proof of 
the Kashin-Tzafriri theorem. This streamlined argument appears to be simpler than all previ- 
ously published approaches, but it contains no significant conceptual innovations. Our discussion 
culminates in an algorithm remarkably similar to Algorithm 1. 

5.1. Preliminary results. Suppose $A$ is a standardized matrix with $n$ columns. We will work
instead with a related matrix $H = A^*A - I$, which is called the hollow Gram matrix. The advantage
of considering the hollow Gram matrix is that we can perform column selection on $A$ simply by
reducing the norm of $H$.

Proposition 5.1. Suppose $A$ is a standardized matrix with hollow Gram matrix $H$. If $\tau$ is a set
of column indices for which $\|H_{\tau\times\tau}\| \le 0.5$, then $\kappa(A_\tau) \le \sqrt{3}$.

Proof. The hypothesis $\|H_{\tau\times\tau}\| \le 0.5$ implies that the eigenvalues of $H_{\tau\times\tau}$ lie in the range $[-0.5, 0.5]$.
Since $H_{\tau\times\tau} = A_\tau^*A_\tau - I$, the eigenvalues of $A_\tau^*A_\tau$ fall in the interval $[0.5, 1.5]$. An equivalent
condition is that $0.5 \le \|A_\tau x\|_2^2 \le 1.5$ whenever $\|x\|_2 = 1$. We conclude that

$$\kappa(A_\tau) = \max\left\{ \frac{\|A_\tau x\|_2}{\|A_\tau y\|_2} : \|x\|_2 = \|y\|_2 = 1 \right\} \le \sqrt{\frac{1.5}{0.5}} = \sqrt{3}.$$

Thus, a norm bound for $H_{\tau\times\tau}$ yields a condition number bound for $A_\tau$. □
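The smallest nontrivial case already shows that the constant $\sqrt{3}$ cannot be improved in this argument. Assume two real unit-norm columns with inner product $\rho$: the Gram matrix has eigenvalues $1 \pm \rho$, so the hollow Gram matrix has norm $|\rho|$, and the condition number is available in closed form. The function name below is ours.

```python
import math

def two_column_example(rho):
    """Two real unit-norm columns with inner product rho, where |rho| < 1.

    Returns the spectral norm of the hollow Gram matrix [[0, rho], [rho, 0]]
    and the condition number sqrt((1 + |rho|) / (1 - |rho|)).
    """
    hollow_norm = abs(rho)
    kappa = math.sqrt((1 + abs(rho)) / (1 - abs(rho)))
    return hollow_norm, kappa

for rho in (0.0, 0.3, 0.5):
    print(rho, two_column_example(rho))
```

When $|\rho| = 0.5$, the hollow Gram matrix has norm exactly 0.5 and the condition number equals $\sqrt{3}$.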

As we mentioned before, random selection may reduce other norms even if it does not reduce
the spectral norm. Define the natural norm on linear maps from $\ell_\infty$ to $\ell_1$ by the formula

$$\|G\|_{\infty\to1} = \max\{\|Gx\|_1 : \|x\|_\infty = 1\}.$$

This norm is closely related to the cut norm, which plays a starring role in graph theory [AN04].
For a general $s \times s$ matrix $G$, the best inequality between the $(\infty, 1)$ norm and the spectral norm
is $\|G\|_{\infty\to1} \le s\,\|G\|$. Rohn [Roh00] has established that there is a class of positive semidefinite,
integer matrices for which it is NP-hard to determine the $(\infty, 1)$ norm within an absolute tolerance of
1/2. Nevertheless, it can be approximated within a small relative factor in polynomial time [AN04].

The $(\infty, 1)$ norm decreases when we randomly sample a principal submatrix. The following
result, which we establish in Appendix A.4, is a direct consequence of Rudelson and Vershynin's
work on the cut norm of random submatrices [RV07, Thm. 1.5].

Theorem 5.2. Suppose $A$ is an $n$-column standardized matrix with hollow Gram matrix $H$. Choose

$$s \le \lceil c \cdot \operatorname{st.rank}(A) \rceil,$$

and draw a uniformly random subset $\sigma$ with cardinality $s$ from $\{1, 2, \dots, n\}$. Then

$$\mathbb{E}\,\|H_{\sigma\times\sigma}\|_{\infty\to1} \le \frac{s}{9}.$$

In particular, $\|H_{\sigma\times\sigma}\|_{\infty\to1} \le s/8$ with probability at least 1/9.

To connect the $(\infty, 1)$ norm with the spectral norm, we call on the celebrated factorization of
Grothendieck [Pis86, p. 56].

Theorem 5.3 (Grothendieck Factorization). Each matrix $G$ can be factored as $G = D_1 T D_2$ where

(1) $D_i$ is a nonnegative, diagonal matrix with $\operatorname{trace}(D_i^2) = 1$ for $i = 1, 2$, and

(2) $\|G\|_{\infty\to1} \le \|T\| \le K_G \|G\|_{\infty\to1}$.

When $G$ is Hermitian, we may take $D_1 = D_2$.

The precise value of the Grothendieck constant $K_G$ remains an outstanding open question, but
it is known to depend on the scalar field [Pis86, Sec. 5e].



• When the scalar field is real, $1.570 \le \pi/2 \le K_G(\mathbb{R}) \le \pi/(2\log(1 + \sqrt{2})) \le 1.783$.

• When the scalar field is complex, $1.338 \le K_G(\mathbb{C}) \le 1.405$.

For positive semidefinite $G$, the real (resp., complex) Grothendieck constant equals the square of
the real (resp., complex) Pietsch constant because

$$\|B^*B\|_{\infty\to1} = \|B\|_{\infty\to2}^2.$$

The following proposition describes the role of the Grothendieck factorization in the selection of 
submatrices with controlled spectral norm. 

Proposition 5.4. Suppose $G$ is an $s \times s$ Hermitian matrix. There is a set $\tau$ of column indices for
which

$$|\tau| \ge \frac{s}{2} \quad\text{and}\quad \|G_{\tau\times\tau}\| \le \frac{2K_G}{s}\, \|G\|_{\infty\to1}.$$

Proof. Consider a Grothendieck factorization $G = DTD$, and identify $\tau = \{j : d_{jj}^2 < 2/s\}$. The
remaining details echo the proof of Proposition 2.3. □

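5.2. Proof of Bourgain-Tzafriri. Suppose $A$ is a standardized matrix with $n$ columns, and
consider its hollow Gram matrix $H$. Theorem 5.2 provides a set $\sigma$ for which

$$|\sigma| \ge c \cdot \operatorname{st.rank}(A) \quad\text{and}\quad \|H_{\sigma\times\sigma}\|_{\infty\to1} \le \frac{|\sigma|}{8}.$$

Apply Proposition 5.4 to the $s \times s$ matrix $G = H_{\sigma\times\sigma}$, where $s = |\sigma|$, to obtain a further subset $\tau$ inside $\sigma$ with

$$|\tau| \ge \frac{s}{2} \quad\text{and}\quad \|G_{\tau\times\tau}\| \le \frac{2K_G}{s}\, \|G\|_{\infty\to1}.$$

Since $2K_G < 4$ and $H_{\tau\times\tau} = G_{\tau\times\tau}$, we determine that

$$|\tau| \ge \frac{c}{2} \cdot \operatorname{st.rank}(A) \quad\text{and}\quad \|H_{\tau\times\tau}\| < 0.5.$$

In view of Proposition 5.1, we conclude $\kappa(A_\tau) \le \sqrt{3}$.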

Now, take another step back and notice that this argument is nearly algorithmic. The
random selection of $\sigma$ can easily be implemented in practice, even though the proof does not
specify the value of $c$. Given a Grothendieck factorization $G = DTD$, it is straightforward to
identify the subset $\tau$. The challenge, as before, is to produce the factorization.

5.3. Grothendieck factorization via convex optimization. As with the Pietsch factorization, 
the Grothendieck factorization can be identified from the solution to a convex program. 



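Theorem 5.5. Suppose $G$ is Hermitian. The factorization $G = DTD$ satisfies $\|T\| \le \alpha$ if and
only if $D$ satisfies

$$\lambda_{\max}\begin{bmatrix} -\alpha D^2 & G \\ G & -\alpha D^2 \end{bmatrix} \le 0. \tag{5.1}$$

In particular, if no $D$ verifies this bound, then no factorization $G = DTD$ admits $\|T\| \le \alpha$.

Proof. To check the forward implication, we essentially repeat the argument we used in Theorem 3.1
for the Pietsch case. This reasoning yields the pair of relations

$$G - \alpha D^2 \preceq 0 \quad\text{and}\quad -G - \alpha D^2 \preceq 0.$$

Together, these two relations are equivalent with (5.1) because

$$\begin{bmatrix} -\alpha D^2 & G \\ G & -\alpha D^2 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} I & I \\ -I & I \end{bmatrix}^* \begin{bmatrix} G - \alpha D^2 & 0 \\ 0 & -G - \alpha D^2 \end{bmatrix} \begin{bmatrix} I & I \\ -I & I \end{bmatrix}.$$

To prove the reverse implication, we assume that (5.1) holds. First, we must check that $d_{jj} = 0$
implies that $g_j = 0$, where $g_j$ is the $j$th column of $G$. To verify this claim, write $e_j$ for the $j$th
standard basis vector and observe that

$$0 \ge \begin{bmatrix} g_j \\ \alpha e_j \end{bmatrix}^* \begin{bmatrix} -\alpha D^2 & G \\ G & -\alpha D^2 \end{bmatrix} \begin{bmatrix} g_j \\ \alpha e_j \end{bmatrix} = \alpha\left(2\|g_j\|_2^2 - g_j^* D^2 g_j\right) \ge \alpha \|g_j\|_2^2$$

because $\operatorname{trace}(D^2) = 1$. Therefore, we may construct a Grothendieck factorization $G = DTD$
with $\|T\| \le \alpha$ by setting $T = D^\dagger G D^\dagger$. □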

This discussion leads us to frame the eigenvalue minimization problem

$$\min\ \lambda_{\max}\begin{bmatrix} -\alpha F & G \\ G & -\alpha F \end{bmatrix} \quad\text{subject to}\quad \operatorname{trace}(F) = 1,\ F \text{ diagonal},\ F \ge 0. \tag{5.2}$$

Owing to Theorem 5.5, there is a factorization $G = DTD$ with $\|T\| \le \alpha$ if and only if the value
of (5.2) is nonpositive.

As in Section 3.2, we can easily construct Grothendieck factorizations from (imprecise) solutions
to the problem (5.2). The proof of Bourgain-Tzafriri suggests that an appropriate value for the
parameter is $\alpha = s/4$. Furthermore, we do not need to solve (5.2) to optimality to obtain the required
information. Indeed, it suffices to produce a feasible point with an objective value of $O(1)$.

To solve (5.2) in practice, we again propose the Entropic Mirror Descent algorithm [BT03].
Appendix B describes the application to this problem. To provide a concrete bound on the
computational cost, we remark that, when $A_\tau$ has dimension $m \times s$, forming $G = A_\tau^*A_\tau - I$ costs at
most $O(s^2 m)$, and Alizadeh's interior-point method [Ali95] requires $\widetilde{O}(s^{3.5})$ time.

Remark 5.6. For symmetric $G$, Theorem 5.3 shows that the norm $\|G\|_{\infty\to1}$ is approximated
within a factor $K_G$ by the least $\alpha$ for which (5.2) has a nonpositive value. A natural reformulation
of (5.2) can identify this value of $\alpha$ automatically (cf. Section 3.3). For nonsymmetric $G$, similar
optimization problems arise. These ideas yield new approximation algorithms for the $(\infty, 1)$ norm.

5.4. An algorithm for Bourgain-Tzafriri. We are prepared to state our algorithm for producing
the set $\tau$ described by the Bourgain-Tzafriri theorem. The procedure appears as Algorithm 2
on page 11. Note the striking similarity with Algorithm 1. The following result describes the
performance of the algorithm. We omit the proof, which parallels that of Theorem 4.1.

Theorem 5.7. Suppose $A$ is an $m \times n$ standardized matrix. With probability at least 3/4,
Algorithm 2 produces a set $\tau = \tau_\star$ of column indices for which

$$|\tau| \ge c \cdot \operatorname{st.rank}(A) \quad\text{and}\quad \kappa(A_\tau) \le \sqrt{3}.$$

The computational cost is bounded by $\widetilde{O}(|\tau|^2 m + |\tau|^{3.5})$.

6. Future Directions 

After the initial work [BT87], additional research has clarified the role of the stable rank. We
highlight a positive result of Vershynin [Ver01, Cor. 7.1] and a negative result of Szarek [Sza90,
Thm. 1.2] which together imply that the stable rank describes precisely how large a well-conditioned
column submatrix can exist in general. See [Ver01, Sec. 5] for a more detailed discussion.

Theorem 6.1 (Vershynin 2001). Fix $\varepsilon > 0$. For each matrix $A$, there is a set $\tau$ of column indices
for which

$$|\tau| \ge (1 - \varepsilon) \cdot \operatorname{st.rank}(A) \quad\text{and}\quad \kappa(A_\tau) \le C(\varepsilon).$$

Theorem 6.2 (Szarek). There is a sequence $\{A(n)\}$ of matrices of increasing dimension for which

$$|\tau| = \operatorname{st.rank}(A) \quad\Longrightarrow\quad \kappa(A_\tau) = \omega(1).$$

Vershynin's proof constructs the set $\tau$ in Theorem 6.1 with a complicated iteration that
interleaves the Kashin-Tzafriri theorem and the Bourgain-Tzafriri theorem. We believe that the
argument can be simplified substantially and developed into a column selection algorithm. This 
achievement might lead to a new method for performing rank-revealing factorizations, which could 
have a significant impact on the practice of numerical linear algebra. 
Algorithm 1: Constructive version of Kashin-Tzafriri Theorem

KT(A)
Input: Standardized matrix $A$ with $n$ columns
Output: A subset $\tau_\star$ of $\{1, 2, \dots, n\}$
Description: Produces $\tau_\star$ such that $|\tau_\star| \ge \operatorname{st.rank}(A)/2$ and $\|A_{\tau_\star}\| \le 15$ w.p. 4/5

    $\tau_\star = \{1\}$
    for $s = 4, 8, 16, \dots, n$
        for $k = 1, 2, 3, \dots, 8\log_2 s$
            $\tau$ = Norm-Reduce($A$, $s$)
            if $\|A_\tau\| \le 15$ then $\tau_\star = \tau$ and break
        if $|\tau_\star| < s$ then exit

Norm-Reduce(A, s)
Input: Standardized matrix $A$ with $n$ columns, a parameter $s$
Output: A subset $\tau$ of $\{1, 2, \dots, n\}$

    Draw a uniformly random set $\sigma$ with cardinality $s$ from $\{1, 2, \dots, n\}$
    Solve (3.2) with $B = A_\sigma$ and $\alpha = 8K_P\sqrt{s}$ to obtain a factorization $B = TD$
    Return $\tau = \{j \in \sigma : d_{jj}^2 < 2/s\}$


Algorithm 2: Constructive version of Bourgain-Tzafriri Theorem

BT(A)
Input: Standardized matrix $A$ with $n$ columns
Output: A subset $\tau_\star$ of $\{1, 2, \dots, n\}$
Description: Produces $\tau_\star$ such that $|\tau_\star| \ge \operatorname{st.rank}(A)/2$ and $\kappa(A_{\tau_\star}) \le \sqrt{3}$ w.p. 3/4

    $\tau_\star = \{1\}$
    for $s = 4, 8, 16, \dots, n$
        for $k = 1, 2, 3, \dots, 8\log_2 s$
            $\tau$ = Cond-Reduce($A$, $s$)
            if $\kappa(A_\tau) \le \sqrt{3}$ then $\tau_\star = \tau$ and break
        if $|\tau_\star| < s$ then exit

Cond-Reduce(A, s)
Input: Standardized matrix $A$ with $n$ columns, a parameter $s$
Output: A subset $\tau$ of $\{1, 2, \dots, n\}$

    Draw a uniformly random set $\sigma$ with cardinality $s$ from $\{1, 2, \dots, n\}$
    Solve (5.2) with $G = A_\sigma^* A_\sigma - I$ and $\alpha = s/4$ to obtain a factorization $G = DTD$
    Return $\tau = \{j \in \sigma : d_{jj}^2 < 2/s\}$
Appendix A. Random Reduction of Norms

How does the norm of a matrix change when we pass to a random submatrix? This question
has great importance in modern functional analysis, but it also has implications for the design of
algorithms. This appendix describes some general results on how random selection reduces the
$(\infty, 2)$ norm and the $(\infty, 1)$ norm. We also specialize these results to the structured matrices that
appear in the proofs of Theorem 1.1 and Theorem 1.2.

A.1. Random Coordinate Models. We begin with two standard models for selecting random
submatrices, and we describe how these models are related for an important class of matrix norms.

A matrix norm is monotonic if the norm of a matrix exceeds the norm of every (rectangular)
submatrix. More precisely, the norm $|||\cdot|||$ is monotonic if

\[ |||PAP'||| \le |||A||| \]

for each matrix $A$ and each pair $P, P'$ of diagonal (i.e., coordinate) projectors. The basic example
of a monotonic matrix norm is the natural norm on operators from $\ell_p$ to $\ell_q$ with $p, q$ in $[1, \infty]$,
which is defined as

\[ \|A\|_{p \to q} = \max\{ \|Ax\|_q : \|x\|_p = 1 \}. \]

Fix a number $\delta$ in $[0,1]$, and denote by $P_\delta$ a random $n \times n$ diagonal matrix where exactly
$s = \lfloor \delta n \rfloor$ entries equal one and the rest equal zero. This matrix can be viewed as a projector onto a
random set of $s$ coordinates. Therefore, we may treat $AP_\delta$ as a random $s$-column submatrix of $A$
by ignoring the zeroed columns. Although this model is conceptually appealing, it can be difficult
to analyze because of the dependencies among coordinates.

Let us introduce a simpler model for selecting random coordinates. We denote by $R_\delta$ a random
$n \times n$ diagonal matrix whose entries are independent 0-1 random variables with common mean $\delta$.
This matrix is a projector onto a random set of coordinates with average cardinality $\delta n$.

There is a basic result connecting these two models. The statement here follows directly from 
the argument in [Tro08, Lem. 14]. 

Proposition A.1 (Poissonization). Let $|||\cdot|||$ be a monotonic matrix norm. For each matrix $A$ with
$n$ columns, it holds that

\[ \mathbb{E}\,|||AP_\delta||| \le 2\,\mathbb{E}\,|||AR_\delta|||. \]

For each $n \times n$ matrix $H$, it holds that

\[ \mathbb{E}\,|||P_\delta H P_\delta||| \le 2\,\mathbb{E}\,|||R_\delta H R_\delta|||. \]
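The two coordinate models are easy to simulate. The following Python sketch draws index sets from each model and forms Monte Carlo estimates of both sides of the first Poissonization inequality for the spectral norm, which is a monotonic norm. The function names, the test matrix, and the trial counts are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_exact(n, s, rng):
    """P_delta model: a uniformly random set of exactly s coordinates."""
    return np.sort(rng.choice(n, size=s, replace=False))

def sample_bernoulli(n, delta, rng):
    """R_delta model: each coordinate is kept independently with probability delta."""
    return np.flatnonzero(rng.random(n) < delta)

def mean_submatrix_norm(A, sampler, trials, rng):
    """Monte Carlo estimate of E ||A_tau|| (spectral norm) over random column sets tau."""
    total = 0.0
    for _ in range(trials):
        tau = sampler(rng)
        if tau.size > 0:                      # an empty submatrix contributes zero
            total += np.linalg.norm(A[:, tau], 2)
    return total / trials

rng = np.random.default_rng(2)
n, s = 40, 10
A = rng.standard_normal((20, n))
A /= np.linalg.norm(A, axis=0)                # unit-norm columns

lhs = mean_submatrix_norm(A, lambda r: sample_exact(n, s, r), 300, rng)
rhs = 2 * mean_submatrix_norm(A, lambda r: sample_bernoulli(n, s / n, r), 300, rng)
```

In experiments of this kind one expects `lhs` to fall well below `rhs`, since the factor of two in the proposition is a worst-case allowance.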

A.2. Reduction of the $(\infty, 2)$ norm. We begin with a general result on the $(\infty, 2)$ norm of a
uniformly random set of columns drawn from a fixed matrix. The basic argument appears already
in the work of Bourgain and Tzafriri [BT91, Thm. 1.1], but modern proofs are a little simpler.
(See [Ver06, Lem. 2.3], for example.) The version here offers especially good constants.

Theorem A.2. Fix $\delta \in [0, 1]$, and suppose $A$ is a matrix with $n$ columns. Then

\[ \mathbb{E}\,\|AR_\delta\|_{\infty \to 2} \le \sqrt{2\delta(1-\delta)}\,\|A\|_F + \delta\,\|A\|_{\infty \to 2}. \]
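For small real matrices, the bound of Theorem A.2 can be checked numerically. The sketch below is a hypothetical illustration that works over the real field: it computes the $(\infty, 2)$ norm by brute force over sign vectors (a convex function attains its maximum over the cube at a vertex) and compares a Monte Carlo estimate of the left-hand side with the right-hand side. The brute-force search is exponential in the number of columns, so it is suitable only for toy sizes.

```python
import itertools
import numpy as np

def norm_inf_to_2(B):
    """Brute-force (inf, 2) norm of a real matrix: max of ||B x||_2 over sign vectors x."""
    m = B.shape[1]
    if m == 0:
        return 0.0
    best = 0.0
    # By the symmetry x -> -x, the first sign can be fixed to +1.
    for rest in itertools.product((-1.0, 1.0), repeat=m - 1):
        x = np.array((1.0,) + rest)
        best = max(best, float(np.linalg.norm(B @ x)))
    return best

rng = np.random.default_rng(3)
n, delta, trials = 10, 0.3, 200
A = rng.standard_normal((6, n))
A /= np.linalg.norm(A, axis=0)

# Each trial keeps every column independently with probability delta (the R_delta model).
estimate = np.mean([
    norm_inf_to_2(A[:, rng.random(n) < delta]) for _ in range(trials)
])
bound = (np.sqrt(2 * delta * (1 - delta)) * np.linalg.norm(A, "fro")
         + delta * norm_inf_to_2(A))
```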

We postpone the argument to the next section so we may note a corollary that appears as a key 
step in the proof of the Kashin-Tzafriri theorem. 

Corollary A.3. Suppose $A$ is a standardized matrix with $n$ columns. Choose $s \le \lceil 2\,\mathrm{st.rank}(A) \rceil$,
and write $\delta = s/n$. Then

\[ \mathbb{E}\,\|AP_\delta\|_{\infty \to 2} \le 7\sqrt{s}. \]
Proof. Owing to the standardization, $1 \le \mathrm{st.rank}(A) = n / \|A\|^2$. It follows that

\[ \delta \le \frac{2\,\mathrm{st.rank}(A) + 1}{n} \le \frac{3\,\mathrm{st.rank}(A)}{n} = \frac{3}{\|A\|^2}. \]

Apply the Poissonization result, Proposition A.1, to see that

\[ \mathbb{E}\,\|AP_\delta\|_{\infty \to 2} \le 2\,\mathbb{E}\,\|AR_\delta\|_{\infty \to 2}. \]

Theorem A.2 yields

\[ \mathbb{E}\,\|AP_\delta\|_{\infty \to 2} \le 2\sqrt{2\delta}\,\|A\|_F + 2\delta\,\|A\|_{\infty \to 2}. \]

Since $A$ has $n$ unit-norm columns, it holds that $\|A\|_F = \sqrt{n}$. We also have the general bound
$\|A\|_{\infty \to 2} \le \sqrt{n}\,\|A\|$. Therefore,

\[ \mathbb{E}\,\|AP_\delta\|_{\infty \to 2} \le 2\sqrt{2\delta n} + 2\delta\sqrt{n}\,\|A\| = 2\sqrt{s}\left(\sqrt{2} + \sqrt{\delta}\,\|A\|\right). \]

Introduce the bound on $\delta$ and make a numerical estimate to complete the proof. □
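For the reader's convenience, the closing numerical estimate can be written out as follows. The bound $\delta \le 3/\|A\|^2$ gives $\sqrt{\delta}\,\|A\| \le \sqrt{3}$, and the decimal value is rounded.

```latex
\[
\mathbb{E}\,\|AP_\delta\|_{\infty \to 2}
  \le 2\sqrt{s}\left(\sqrt{2} + \sqrt{\delta}\,\|A\|\right)
  \le 2\left(\sqrt{2} + \sqrt{3}\right)\sqrt{s}
  \approx 6.29\,\sqrt{s}
  \le 7\sqrt{s}.
\]
```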

A.3. Proof of Theorem A.2. We must bound the quantity

\[ E = \mathbb{E}\,\|AR_\delta\|_{\infty \to 2}. \]

It turns out that it is easier to work with the $(2, 1)$ norm, which is dual to the $(\infty, 2)$ norm, because
there are some special methods that apply. Rewrite the expression as

\[ E = \mathbb{E} \max_{\|x\|_2 = 1} \sum_{j=1}^{n} \delta_j\,|\langle a_j, x \rangle| \]

where $\{\delta_j\}$ is a sequence of independent 0-1 random variables with common mean $\delta$. In the sequel,
we simplify notation by omitting the restriction on the vector $x$ and the limits from the sum.

The next step is to center and symmetrize the selectors. First, add and subtract the mean of
each term from the sum and use the subadditivity of the maximum to obtain

\[
\begin{aligned}
E &\le \mathbb{E} \max_x \sum_j (\delta_j - \delta)\,|\langle a_j, x \rangle| + \max_x \sum_j \delta\,|\langle a_j, x \rangle| \\
  &= \mathbb{E} \max_x \sum_j (\delta_j - \delta)\,|\langle a_j, x \rangle| + \delta\,\|A^*\|_{2 \to 1} \\
  &= \mathbb{E} \max_x \sum_j (\delta_j - \delta)\,|\langle a_j, x \rangle| + \delta\,\|A\|_{\infty \to 2}.
\end{aligned}
\]

We focus on the first term, which we abbreviate by the letter $F$. Let $\{\delta'_j\}$ be an independent copy
of the sequence $\{\delta_j\}$. Jensen's inequality allows that

\[
F = \mathbb{E} \max_x \sum_j (\delta_j - \mathbb{E}\,\delta'_j)\,|\langle a_j, x \rangle|
  \le \mathbb{E} \max_x \sum_j (\delta_j - \delta'_j)\,|\langle a_j, x \rangle|.
\]

Observe that $\{\delta_j - \delta'_j\}$ is a sequence of independent, symmetric random variables. Thus, we may
multiply each one by a random sign without changing the expectation [LT91, Lem. 6.3]. That is,

\[ F \le \mathbb{E} \max_x \sum_j \varepsilon_j (\delta_j - \delta'_j)\,|\langle a_j, x \rangle| \]

where $\{\varepsilon_j\}$ is a sequence of independent Rademacher (i.e., uniform $\pm 1$) random variables.

Now, we invoke a specific type of Rademacher comparison [LT91, Thm. 4.12 et seq.] to remove
the absolute values from the inner product:

\[
F \le \mathbb{E} \max_x \sum_j \varepsilon_j (\delta_j - \delta'_j)\,\langle a_j, x \rangle
  = \mathbb{E} \max_x \Big\langle \sum_j \varepsilon_j (\delta_j - \delta'_j)\,a_j,\; x \Big\rangle.
\]

Since $x$ ranges over the $\ell_2$ unit sphere, we reach

\[ F \le \mathbb{E}\,\Big\| \sum_j \varepsilon_j (\delta_j - \delta'_j)\,a_j \Big\|_2. \]
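The remaining expectations are elementary. First, apply Hölder's inequality to obtain

\[ F \le \Big( \mathbb{E}\,\Big\| \sum_j \varepsilon_j (\delta_j - \delta'_j)\,a_j \Big\|_2^2 \Big)^{1/2}. \]

Compute the expectation with respect to $\{\varepsilon_j\}$ and then with respect to $\{\delta_j\}$ and $\{\delta'_j\}$:

\[
F \le \Big( \mathbb{E} \sum_j (\delta_j - \delta'_j)^2\,\|a_j\|_2^2 \Big)^{1/2}
  = \Big( \sum_j 2\delta(1-\delta)\,\|a_j\|_2^2 \Big)^{1/2}
  = \sqrt{2\delta(1-\delta)}\,\|A\|_F.
\]

Introduce this bound on $F$ into the bound on $E$ to conclude that

\[ \mathbb{E}\,\|AR_\delta\|_{\infty \to 2} \le \sqrt{2\delta(1-\delta)}\,\|A\|_F + \delta\,\|A\|_{\infty \to 2}. \]

This is the advertised estimate.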
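The constant $2\delta(1-\delta)$ in the estimate comes from a short variance calculation, recorded here as a worked step. Averaging over the independent Rademacher signs removes the cross terms in the squared norm. Each $\delta_j$ and $\delta'_j$ is a 0-1 variable with mean $\delta$, and the two are independent, so the remaining expectation is a sum of two variances.

```latex
\[
\mathbb{E}\,(\delta_j - \delta'_j)^2
  = \operatorname{Var}(\delta_j) + \operatorname{Var}(\delta'_j)
  = 2\delta(1-\delta),
\qquad
\sum_j \|a_j\|_2^2 = \|A\|_F^2 .
\]
```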

A.4. Reduction of the $(\infty, 1)$ norm. The impact of random selection on the $(\infty, 1)$ norm has
already received some attention in the theoretical computer science literature because of a connection
with graph cuts. The following result of Rudelson and Vershynin contains detailed information
on the $(\infty, 1)$ norm of a random principal submatrix. The statement involves an auxiliary norm

\[ \|H\|_{\mathrm{col}} = \sum_j \|H e_j\|_2, \]

where $\{e_j\}$ is the set of standard basis vectors. In words, we sum the Euclidean norms of the
columns of the matrix.

Theorem A.4 (Rudelson-Vershynin). Fix $\delta \in [0, 1]$, and suppose $H$ is an $n \times n$ matrix. Then

\[
\mathbb{E}\,\|R_\delta H R_\delta\|_{\infty \to 1}
  \le C \left[ \delta^2\,\|H - \mathrm{diag}(H)\|_{\infty \to 1}
  + \delta^{3/2} \left( \|H\|_{\mathrm{col}} + \|H^*\|_{\mathrm{col}} \right)
  + \delta\,\|\mathrm{diag}(H)\|_{\infty \to 1} \right].
\]

Theorem A.4 is established with the same methods as Theorem A.2, along with an additional
decoupling argument [BT91, Prop. 1.9]. We rely on the following corollary in our proof of the
Bourgain-Tzafriri theorem.

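Corollary A.5. Suppose $A$ is an $n$-column standardized matrix with hollow Gram matrix $H =
A^*A - I$. Choose $s \le \lceil c \cdot \mathrm{st.rank}(A) \rceil$, and write $\delta = s/n$. Then

\[ \mathbb{E}\,\|P_\delta H P_\delta\|_{\infty \to 1} \le \frac{s}{[\ldots]}. \]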
Proof. Suppose $A$ is a standardized matrix with $n$ columns, and define its $n \times n$ hollow Gram
matrix $H$. Observe that the $(\infty, 1)$ norm of $H$ satisfies the bound

\[ \|H\|_{\infty \to 1} \le n\,\|H\| \le n \max\{ \|A^*A\| - 1,\, 1 \} \le n\,\|A\|^2. \]

Meanwhile, the $\|\cdot\|_{\mathrm{col}}$ norm satisfies

\[ \|H\|_{\mathrm{col}} \le \|A^*A\|_{\mathrm{col}} = \sum_j \|A^* a_j\|_2 \le n\,\|A\|. \]

These facts play a central role in the calculation.
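The two norm bounds used in the proof can be checked on a small example. The Python sketch below is a hypothetical illustration over the real field: it builds a standardized matrix, forms the hollow Gram matrix, and evaluates both norms, computing the $(\infty, 1)$ norm by brute force over sign vectors. The function names and the test matrix are assumptions introduced for the example.

```python
import itertools
import numpy as np

def col_norm(H):
    """Sum of the Euclidean norms of the columns of H."""
    return float(np.linalg.norm(H, axis=0).sum())

def norm_inf_to_1(H):
    """Brute-force (inf, 1) norm of a real matrix: max of ||H x||_1 over sign vectors x."""
    n = H.shape[1]
    if n == 0:
        return 0.0
    best = 0.0
    # By the symmetry x -> -x, the first sign can be fixed to +1.
    for rest in itertools.product((-1.0, 1.0), repeat=n - 1):
        x = np.array((1.0,) + rest)
        best = max(best, float(np.abs(H @ x).sum()))
    return best

rng = np.random.default_rng(4)
n = 8
A = rng.standard_normal((5, n))
A /= np.linalg.norm(A, axis=0)        # standardize: unit-norm columns
H = A.T @ A - np.eye(n)               # hollow Gram matrix (zero diagonal)
spec = np.linalg.norm(A, 2)           # spectral norm ||A||
```

With these quantities in hand, `norm_inf_to_1(H)` can be compared against `n * spec**2` and `col_norm(H)` against `n * spec`.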

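To continue, invoke the Poissonization result, Proposition A.1, which yields

\[ \mathbb{E}\,\|P_\delta H P_\delta\|_{\infty \to 1} \le 2\,\mathbb{E}\,\|R_\delta H R_\delta\|_{\infty \to 1}. \]

Theorem A.4 provides that

\[ \mathbb{E}\,\|P_\delta H P_\delta\|_{\infty \to 1} \le C \left[ \delta^2\,\|H\|_{\infty \to 1} + \delta^{3/2}\,\|H\|_{\mathrm{col}} \right], \]

where we have applied the facts that $H$ is Hermitian and has a zero diagonal. The two norm
bounds result in additional simplifications:

\[
\mathbb{E}\,\|P_\delta H P_\delta\|_{\infty \to 1}
  \le C \left[ \delta^2 n\,\|A\|^2 + \delta^{3/2} n\,\|A\| \right]
  = C s \left[ \delta\,\|A\|^2 + \delta^{1/2}\,\|A\| \right].
\]

Since $A$ has unit-norm columns, $\mathrm{st.rank}(A) = n / \|A\|^2$. As a result, $\delta = s/n \le c / \|A\|^2$. By fixing
a sufficiently small constant $c$, we can ensure that

\[ \mathbb{E}\,\|P_\delta H P_\delta\|_{\infty \to 1} \le \frac{s}{[\ldots]}, \]

the advertised bound. □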



Appendix B. Entropic Mirror Descent 

The algorithms for the Kashin-Tzafriri theorem and the Bourgain-Tzafriri theorem both require 
the solution to a convex minimization problem over the probability simplex. It is important to 
have a practical algorithm for approaching these optimizations. To that end, we briefly describe a 
simple, elegant method called Entropic Mirror Descent [BT03]. We then explain how to apply this 
technique to the specific objective functions that arise in our work. 

B.1. Convex analysis. Let $E$ be a Euclidean space, i.e., a vector space equipped with a real-linear
inner product. Let $\Omega$ be a convex subset of $E$, and consider a convex function $J : \Omega \to \mathbb{R}$. The
subdifferential $\partial J(f)$ contains each vector $\theta \in E^*$ that satisfies the inequalities

\[ J(h) - J(f) \ge \langle \theta, h - f \rangle \quad \text{for all } h \in \Omega. \]

The elements of the subdifferential are called subgradients. They describe the directions and rates
of ascent of the function $J$ at the point $f$. When $J$ is differentiable at $f$, the gradient is the unique
subgradient.

The Lipschitz constant of the function $J$ with respect to a norm $|||\cdot|||$ is defined to be the least
number $L$ for which

\[ |J(h) - J(f)| \le L\,|||h - f||| \quad \text{for all } h, f \in \Omega. \]

It can be shown [Roc70, Thm. 24.7] that

\[ L = \sup\{ |||\theta|||_* : \theta \in \partial J(f),\ f \in \Omega \}, \]

where $|||\cdot|||_*$ is the dual norm.

B.2. Interior subgradient methods. Consider the (nonsmooth) convex program 

\[ \min J(f) \quad \text{subject to} \quad f \in \Omega. \]

Subgradient information can be used to solve this problem, but caution is necessary because the 
negative subgradient is not necessarily a direction of descent. As a result, subgradient methods are 
typically nonmonotone, which means that the value of the objective function can (and often will) 
increase. It is also common for subgradient methods to produce iterates outside the constraint set. 
The classical remedy is to project each iterate back onto the constraint set. This idea succeeds, 
but it leads to zigzagging phenomena. 

Interior subgradient methods [BT03] are designed to eliminate some of the problematic behavior 
that projected subgradient methods exhibit. To develop an interior subgradient method, we need a 
divergence measure that is tailored to the constraint set. At each iteration, we perform two steps: 

(1) At the current iterate $f$, compute a subgradient $\theta \in \partial J(f)$ to linearize the objective
function:

\[ J(h) \approx J(f) + \langle \theta, h - f \rangle. \]

(2) Penalize the linearization with the divergence $D(\cdot\,; f)$ from the current iterate, scaled by a
(large) parameter $\beta^{-1}$. Minimize this auxiliary function to produce a new iterate $f'$:

\[ f' \in \arg\min_{h \in \Omega} \left\{ J(f) + \langle \theta, h - f \rangle + \beta^{-1} D(h; f) \right\}. \]
Algorithm 3: Entropic Mirror Descent

Emd(J, s, T)

Input: Objective function $J$, dimension $s$, number $T$ of iterations
Output: Approximate minimizer $f$ of $J$

1  $f^{(1)} = s^{-1} e$                                          { Initialize with uniform density }
2  for $t = 1$ to $T$
3      Find $\theta \in \partial J(f^{(t)})$                      { Compute subgradient }
4      $\beta = \sqrt{2 \log s / (T\,\|\theta\|_\infty^2)}$       { Compute step size }
5      $h = f^{(t)} \cdot \exp(-\beta\theta)$                     { Reweight current iterate }
6      $f^{(t+1)} = h / \mathrm{trace}(h)$                        { Rescale to obtain next iterate }
7  end for
8  Return $f \in \arg\min_t J(f^{(t)})$



The divergence penalty serves two purposes. First, it ensures that the next iterate is close to 
the previous iterate, which is essential because the linearization is only useful locally. Second, it 
simultaneously prevents the iterates from getting too close to the boundary of the constraint set. 
With a careful choice of the parameter $\beta$, we can guarantee progress toward the optimum set, at
least on average. 

B.3. Optimization on the probability simplex. The Entropic Mirror Descent (EMD) algorithm
of Beck and Teboulle [BT03] is a specific instance of the interior subgradient method that is
designed for minimizing convex functions over the probability simplex, the set defined by

\[ \Delta_s = \{ f \in \mathbb{R}^s : \mathrm{trace}(f) = 1,\ f \ge 0 \}. \]

A natural divergence measure for this set is the relative entropy function:

\[ D(h; f) = \sum_{i=1}^{s} h_i \log\!\left( \frac{h_i}{f_i} \right). \]

An amazing feature of the resulting interior subgradient method is that the optimization in the
second step has a closed form:

\[ f'_i = \frac{f_i \exp(-\beta\theta_i)}{\sum_j f_j \exp(-\beta\theta_j)}. \]

Algorithm 3 describes the procedure that arises from these choices. Beck and Teboulle have established
an elegant efficiency estimate [BT03, Thm. 4.2] for this method.

Theorem B.1 (Efficiency of EMD). Let $J : \Delta_s \to \mathbb{R}$ be a convex function whose Lipschitz constant
with respect to the $\ell_1$ norm is $L$. The approximate minimizer $f$ generated by Algorithm 3 satisfies

\[ J(f) - J(f_\star) \le L \sqrt{\frac{2 \log s}{T}}, \]

where $f_\star$ is a minimizer of $J$.

Algorithm 3 succeeds with a wide range of step sizes. In particular, when the total number $T$ of
iterations is unknown, we may compute the step size using the current iteration number $t$:

\[ \beta = \sqrt{\frac{2 \log s}{t\,\|\theta\|_\infty^2}}. \]



This choice increases the right-hand side of the efficiency estimate by a logarithmic factor. 
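The procedure is short enough to sketch in Python. The fragment below is a minimal illustration rather than a reference implementation. It uses the step size based on the current iteration number $t$, and it assumes a user-supplied function `oracle` (a hypothetical name) that returns the objective value together with one subgradient. Subtracting the largest exponent before exponentiating is a numerical safeguard added here; it has no effect on the iterates because the common factor cancels in the rescaling step.

```python
import numpy as np

def emd(oracle, s, T):
    """Entropic mirror descent over the probability simplex in R^s.

    oracle(f) must return a pair (J(f), theta), where theta is a subgradient
    of J at f. Returns the best iterate seen and its objective value.
    """
    f = np.full(s, 1.0 / s)                       # uniform starting density
    best_f, best_val = f.copy(), np.inf
    for t in range(1, T + 1):
        val, theta = oracle(f)
        if val < best_val:                        # track the arg min over iterates
            best_f, best_val = f.copy(), val
        scale = np.max(np.abs(theta))
        if scale == 0.0:                          # zero subgradient: f is a minimizer
            break
        beta = np.sqrt(2.0 * np.log(s) / (t * scale ** 2))
        w = -beta * theta
        h = f * np.exp(w - w.max())               # reweight; the shift cancels below
        f = h / h.sum()                           # rescale onto the simplex
    return best_f, best_val

# Toy example: a linear objective J(f) = <c, f>, whose minimum over the simplex is min(c).
c = np.array([0.3, 0.1, 0.5])
f_best, val_best = emd(lambda f: (float(c @ f), c), s=3, T=500)
```

For the linear toy objective, the iterates concentrate on the coordinate with the smallest cost, so `val_best` approaches `min(c)`.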
B.4. Pietsch factorization via EMD. Suppose $B$ is a matrix with $s$ columns. We can rephrase
the Pietsch factorization problem (3.2) as an optimization over the probability simplex. Define the
linear operator

\[ \mathrm{diag} : \mathbb{R}^s \to \mathbb{R}^{s \times s} \]

that maps vectors to diagonal matrices in the obvious way. We can write the convex program as

\[ \min\ \lambda_{\max}\!\left( B^*B - \alpha^2\,\mathrm{diag}(f) \right) \quad \text{subject to} \quad f \in \Delta_s. \tag{B.1} \]

Abbreviate the objective function $J : \Delta_s \to \mathbb{R}$. We can evidently apply EMD to complete the
optimization once we find a way to compute subgradients.

We use methods from the convex analysis of Hermitian matrices to determine the subdifferential
of the objective function [Lew96]. Let $A$ be an Hermitian matrix. Then

\[ \partial\lambda_{\max}(A) = \mathrm{conv}\{ uu^* : Au = \lambda_{\max}(A)\,u,\ \|u\|_2 = 1 \}. \]

In words, the subdifferential of the maximum eigenvalue function at $A$ is the convex hull of all
rank-one projectors whose range lies in the top eigenspace of $A$. According to [Roc70, Thm. 23.9],
we have

\[ \partial J(f) = \{ -\alpha^2\,\mathrm{diag}^*(\Theta) : \Theta \in \partial\lambda_{\max}(B^*B - \alpha^2\,\mathrm{diag}(f)) \}, \]

where the adjoint map $\mathrm{diag}^* : \mathbb{R}^{s \times s} \to \mathbb{R}^s$ extracts the diagonal of a matrix. In particular, we
may construct a subgradient $\theta \in \partial J(f)$ from a normalized maximal eigenvector $u$ of the matrix
$B^*B - \alpha^2\,\mathrm{diag}(f)$ using the formula

\[ \theta = -\alpha^2\,\mathrm{diag}^*(uu^*) = -\alpha^2\,|u|^2, \]

where $|\cdot|^2$ denotes the componentwise squared magnitude of a vector.
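The subgradient formula translates directly into code. The following Python sketch evaluates the objective of (B.1) and one subgradient with a dense Hermitian eigensolver, then checks the subgradient inequality at a random point of the simplex. The function name `pietsch_oracle` and the test data are hypothetical, and forming $B^*B$ explicitly is a simplification suited to small dense problems.

```python
import numpy as np

def pietsch_oracle(B, alpha, f):
    """Value and one subgradient of J(f) = lambda_max(B* B - alpha^2 diag(f))."""
    M = B.conj().T @ B - alpha ** 2 * np.diag(f)
    evals, evecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    u = evecs[:, -1]                              # unit-norm top eigenvector
    theta = -alpha ** 2 * np.abs(u) ** 2          # theta = -alpha^2 |u|^2
    return float(evals[-1]), theta

rng = np.random.default_rng(5)
s = 6
B = rng.standard_normal((10, s))
alpha = 3.0
f = np.full(s, 1.0 / s)                           # uniform point of the simplex
h = rng.random(s)
h /= h.sum()                                      # another point of the simplex

J_f, theta = pietsch_oracle(B, alpha, f)
J_h, _ = pietsch_oracle(B, alpha, h)
# Subgradient inequality: J(h) - J(f) >= <theta, h - f>.
slack = (J_h - J_f) - float(theta @ (h - f))
```

Since $u$ has unit norm, the entries of `theta` sum to $-\alpha^2$, and `slack` should be nonnegative up to rounding.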

In summary, we can evaluate the objective function $J(f)$ and simultaneously obtain a subgradient
$\theta \in \partial J(f)$ from an eigenvector calculation plus some lower-order operations. Note that the standard
methods for producing a single eigenvector, such as the Lanczos algorithm and its variants [GVL96,
Ch. 9], require access to the matrix only through its action on vectors. It is therefore preferable in
some settings (for example, when $B$ is sparse) not to form the matrix $B^*B$.
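To illustrate the matrix-free idea, the sketch below computes the top eigenpair using only products with $B$ and $B^*$. It substitutes plain power iteration for the Lanczos method, purely to keep the example short; a Lanczos routine would normally converge much faster. The shift by $\alpha^2 \max_j f_j$ is an assumption of this sketch: it makes the shifted operator positive semidefinite, so the dominant eigenvalue found by power iteration corresponds to the maximum eigenvalue. The function name and the iteration count are hypothetical.

```python
import numpy as np

def top_eigpair_matrix_free(B, alpha, f, iters=5000, seed=0):
    """Approximate the top eigenpair of B* B - alpha^2 diag(f) using only
    products with B and B*, via power iteration on a shifted operator."""
    s = B.shape[1]
    shift = alpha ** 2 * np.max(f)                # makes the shifted operator PSD

    def apply(x):
        # (B* B - alpha^2 diag(f) + shift I) x, without forming B* B
        return B.conj().T @ (B @ x) - alpha ** 2 * f * x + shift * x

    rng = np.random.default_rng(seed)
    u = rng.standard_normal(s)
    u /= np.linalg.norm(u)
    for _ in range(iters):
        v = apply(u)
        nv = np.linalg.norm(v)
        if nv == 0.0:                             # zero operator: keep the current vector
            break
        u = v / nv
    lam = float(np.real(np.vdot(u, apply(u)))) - shift   # Rayleigh quotient, shift removed
    return lam, u

rng = np.random.default_rng(6)
B = rng.standard_normal((20, 8))
f = np.full(8, 1.0 / 8)
alpha = 2.0
lam, u = top_eigpair_matrix_free(B, alpha, f)
exact = np.linalg.eigvalsh(B.T @ B - alpha ** 2 * np.diag(f))[-1]   # dense reference value
```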

Eigenvector computation is a primitive in every numerical linear algebra package, so it is reasonable
to assume that high-precision eigenvectors are available. In any case, slight variants of EMD will 
work with approximate subgradients, provided they are computed to sufficient precision. A simple 
analysis supporting this claim does not seem to be available in the optimization literature, but see 
[Kal07, Ch. 6] for related work. 

We can bound the Lipschitz constant of $J$ with respect to the $\ell_1$ norm just by considering
subgradients of the form $\theta = -\alpha^2\,|u|^2$ because their convex hull yields the complete subdifferential.
Since the eigenvector $u$ is normalized,

\[ \|\theta\|_\infty = \alpha^2 \max_j |u_j|^2 \le \alpha^2, \]

and we determine that the Lipschitz constant $L \le \alpha^2$. According to Theorem B.1, the EMD algorithm
ostensibly requires $O(\alpha^4)$ iterations to deliver a solution to (B.1) with constant precision. In
practice, far fewer iterations suffice.

Remark B.2. The application of EMD to (B.1) closely resembles the multiplicative weights method
[Kal07, Ch. 6] for solving the MAXCUT problem (3.4). Indeed, the two algorithms are substantially
identical, except for the specific choice of step sizes and the method for constructing the final solution 
from the sequence of iterates. The efficiency estimates are also similar, except that the multiplicative 
weights method uses the widths of the constraints in lieu of the Lipschitz constant. EMD appears 
to be more effective in practice because it exploits the geometry of the problem more completely. 
B.5. Grothendieck factorization via EMD. Suppose $G$ is an $s \times s$ Hermitian matrix. The
Grothendieck factorization problem (5.2) can be expressed as solving

\[
\min\ \lambda_{\max} \begin{bmatrix} -\alpha\,\mathrm{diag}(f) & G \\ G & -\alpha\,\mathrm{diag}(f) \end{bmatrix}
\quad \text{subject to} \quad f \in \Delta_s.
\]

Abbreviate the objective function $J : \Delta_s \to \mathbb{R}$. Once again, EMD is an appropriate technique.

We may obtain subgradients using the same methods as before. Compute a normalized, maximal
eigenvector of the matrix:

\[
\begin{bmatrix} -\alpha\,\mathrm{diag}(f) & G \\ G & -\alpha\,\mathrm{diag}(f) \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
= J(f) \begin{bmatrix} u \\ v \end{bmatrix}
\quad \text{where} \quad \|u\|_2^2 + \|v\|_2^2 = 1.
\]

Then a subgradient $\theta \in \partial J(f)$ can be obtained from the formula

\[ \theta = -\alpha \left( |u|^2 + |v|^2 \right). \]

The Lipschitz constant $L \le \alpha$, so the number of iterations of EMD is apparently $O(\alpha^2)$. Of course,
the eigenvector calculations can be streamlined by exploiting the structure of the matrix.
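A subgradient oracle for this objective can be sketched along the same lines as in the Pietsch case. The Python fragment below assumes that the objective is the maximum eigenvalue of the $2s \times 2s$ block matrix with diagonal blocks $-\alpha\,\mathrm{diag}(f)$ and off-diagonal blocks $G$. It forms that matrix densely, which ignores the structure-exploiting shortcuts mentioned in the text. The function name and the test data are hypothetical.

```python
import numpy as np

def grothendieck_oracle(G, alpha, f):
    """Value and one subgradient of
    J(f) = lambda_max([[-alpha diag(f), G], [G, -alpha diag(f)]])."""
    s = G.shape[0]
    D = alpha * np.diag(f)
    M = np.block([[-D, G], [G, -D]])              # Hermitian because G is Hermitian
    evals, evecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    w = evecs[:, -1]                              # unit-norm top eigenvector [u; v]
    u, v = w[:s], w[s:]
    theta = -alpha * (np.abs(u) ** 2 + np.abs(v) ** 2)
    return float(evals[-1]), theta

rng = np.random.default_rng(7)
s = 5
G = rng.standard_normal((s, s))
G = (G + G.T) / 2                                 # real symmetric test matrix
alpha = 2.0
f = np.full(s, 1.0 / s)
h = rng.random(s)
h /= h.sum()

J_f, theta = grothendieck_oracle(G, alpha, f)
J_h, _ = grothendieck_oracle(G, alpha, h)
# Subgradient inequality: J(h) - J(f) >= <theta, h - f>.
slack = (J_h - J_f) - float(theta @ (h - f))
```

The entries of `theta` sum to $-\alpha$ because the stacked eigenvector has unit norm, which is consistent with the Lipschitz bound $L \le \alpha$.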